Add text-to-speech functionality #1412
What are the chances of adding some speech-to-text functionality in there while you're at it? ;)

I would like to add speech-to-text in a separate pull request, plus more settings for voice volume, speed, and voice type. If there is interest, of course :-)

There absolutely is.

Turns out adding speech-to-text to Roo is less straightforward than I thought it would be. I think using something like Whisper would provide the best-quality local solution, but bundling it with Roo is not easy.

    // skip input message
    if (lastMessage && messages.length > 1) {
        let text = lastMessage?.text || ""

        if (
            lastMessage.type === "say" && // is a say message
            !lastMessage.partial && // not a partial message
            !text.startsWith("{") && // not a json object
            text !== lastTtsRef.current // not the same as last TTS message
        ) {
            try {
                playTts(text)
                lastTtsRef.current = text
            } catch (error) {
                console.error("Failed to execute text-to-speech:", error)
            }
        }
    }

Do you mind explaining the logic here? Thank you!
Hello, essentially I only want to read out the messages that the user would expect Roo to read, i.e., messages that appear in the chat interface. The first item in messages is the user input, which we don't need to read aloud. We also don't need to read aloud incomplete messages or JSON objects. I had it check that the message type is say because I didn't want Roo reading aloud ask messages.
Maybe this last behavior should be a toggleable option though?
The code also stores a reference to the last spoken message to prevent duplicate responses from being read.
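The checks described in this comment can be pulled into a small pure function, which makes them easier to test in isolation. This is only a sketch: the `ChatMessage` shape and the name `selectTtsText` are assumptions made for illustration and are not taken from the PR.

```typescript
// Hypothetical, reduced message shape; the real type in the codebase may differ.
interface ChatMessage {
	type: "say" | "ask"
	text?: string
	partial?: boolean
}

// Returns the text that should be read aloud, or null if the latest
// message should be skipped.
function selectTtsText(messages: ChatMessage[], lastSpoken: string): string | null {
	// With only one message, that message is the user's own input.
	if (messages.length <= 1) return null

	const last = messages[messages.length - 1]
	const text = last.text ?? ""

	if (last.type !== "say") return null // skip ask messages
	if (last.partial) return null // skip messages that are still streaming
	if (text.startsWith("{")) return null // skip JSON payloads
	if (text === lastSpoken) return null // skip duplicates of the last spoken text

	return text
}
```

A caller would pass the current value of `lastTtsRef.current` as `lastSpoken` and update the ref only after `playTts` has been called with the returned text.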
I just pulled the branch down and it did seem to read my messages back to me - is that unintended? It does also still seem to read Mermaid and JSON.
Pretty cool experience overall though!
@heyseth sorry I accidentally resolved this conversation somehow. Any ideas on my last question here?
@mrubens sorry for the late response! I'm writing some fixes now that should prevent Mermaid diagrams, JSON, and user input messages from being read aloud.

    filePaths,
    openedTabs,
    soundVolume: state.soundVolume,
    ttsSpeed: state.ttsSpeed,

Do we need ttsEnabled in here too?
I don't think so. The contextValue object is built using the spread operator on the whole state, which already includes ttsEnabled. I only put ttsSpeed in because I noticed that soundVolume was there, but it looks like neither of those is actually needed.
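To illustrate the point about the spread operator, here is a minimal sketch. The `ExtensionState` shape and the `buildContextValue` helper are hypothetical and heavily reduced; the actual context code in `ExtensionStateContext.tsx` is not shown in this thread and may be organized differently.

```typescript
// Hypothetical, reduced state shape for illustration only.
interface ExtensionState {
	ttsEnabled: boolean
	ttsSpeed: number
	soundVolume: number
}

function buildContextValue(state: ExtensionState, filePaths: string[], openedTabs: string[]) {
	return {
		...state, // copies every field of state, including ttsEnabled and ttsSpeed
		filePaths,
		openedTabs,
		// Listing soundVolume or ttsSpeed again here would only repeat
		// values that the spread above has already copied.
	}
}

const value = buildContextValue({ ttsEnabled: true, ttsSpeed: 1.5, soundVolume: 0.5 }, [], [])
console.log(value.ttsEnabled, value.ttsSpeed)
```

Because the spread already carries every state field, explicit entries are only needed for values that are not part of the state or that must override it.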
What changes remain to be made to make this production ready?
I asked a question in a thread above that I accidentally resolved earlier - sorry about that.

@mrubens I've modified the logic in ChatView.tsx for reading aloud messages. It should only read aloud regular messages from Roo now, skipping over user input messages, json objects, and mermaid diagrams. Would you mind testing the feature again?
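The updated logic in ChatView.tsx is not shown in this thread. As a rough idea of how Mermaid diagrams could be kept out of the spoken text, the sketch below removes fenced mermaid blocks before the text is handed to TTS. The helper name `stripMermaidBlocks` and the assumption that diagrams arrive as fenced blocks tagged `mermaid` are illustrative guesses, not the PR's actual approach.

```typescript
// Three backticks, built programmatically so this example can live inside a fenced block.
const FENCE = "`".repeat(3)

// Matches a fenced block that opens with the "mermaid" tag, up to the nearest closing fence.
const MERMAID_BLOCK = new RegExp(FENCE + "mermaid[\\s\\S]*?" + FENCE, "g")

// Hypothetical helper: returns the text with Mermaid blocks removed and
// surrounding whitespace trimmed.
function stripMermaidBlocks(text: string): string {
	return text.replace(MERMAID_BLOCK, "").trim()
}

const sample = "Here is the plan.\n" + FENCE + "mermaid\ngraph TD; A-->B;\n" + FENCE + "\nDone."
console.log(stripMermaidBlocks(sample))
```

Filtering the text rather than skipping the whole message keeps the prose around a diagram audible; skipping the message entirely would be the simpler alternative.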
Great! Will take a look tonight.
mrubens left a comment
Awesome, seems to work great now!

Context
Text-to-speech is a useful quality-of-life feature that allows users to listen to what Roo is doing while having another window open. This feature is also useful for auditory learners (such as myself), because it adds an alternative way for the user to absorb information without needing to read what Roo is doing.
Implementation
Screenshots
How to Test
Get in Touch
Message me in the Roo Code Discord at the handle @ocean.smith
Important
Adds text-to-speech functionality using say.js, with settings and state management for enabling/disabling TTS, and integrates it into message handling.
- say.js in src/utils/tts.ts.
- ChatView.tsx to read aloud non-partial, non-JSON say messages.
- NotificationSettings.tsx and SettingsView.tsx.
- ExtensionStateContext.tsx to include ttsEnabled state management.
- ClineProvider.ts to handle TTS state and message types ttsEnabled and playTts.
- ClineProvider.test.ts to verify enabling/disabling TTS and message handling.
This description was created for 0d4a743. It will automatically update as commits are pushed.